Goto

Collaborating Authors

 channel information


Synesthesia of Machines (SoM)-Based Task-Driven MIMO System for Image Transmission

arXiv.org Artificial Intelligence

--T o support cooperative perception (CP) of networked mobile agents in dynamic scenarios, the efficient and robust transmission of sensory data is a critical challenge. Deep learning-based joint source-channel coding (JSCC) has demonstrated promising results for image transmission under adverse channel conditions, outperforming traditional rule-based codecs. While recent works have explored to combine JSCC with the widely adopted multiple-input multiple-output (MIMO) technology, these approaches are still limited to the discrete-time analog transmission (DT A T) model and simple tasks. Given the limited performance of existing MIMO JSCC schemes in supporting complex CP tasks for networked mobile agents with digital MIMO communication systems, this paper presents a Synesthesia of Machines (SoM)-based task-driven MIMO system for image transmission, referred to as SoM-MIMO. By leveraging the structural properties of the feature pyramid for perceptual tasks and the channel properties of the closed-loop MIMO communication system, SoM-MIMO enables efficient and robust digital MIMO transmission of images. Experimental results have shown that compared with two JSCC baseline schemes, our approach achieves average mAP improvements of 6.30 and 10.48 across all SNR levels, while maintaining identical communication overhead. N the era of beyond fifth generation (B5G) and sixth generation (6G), a large number of mobile agents, including autonomous vehicles, unmanned aerial vehicles, and humanoid robots, etc., will interact in real-time and execute diverse intelligent functions, revolutionizing industries and daily life. To enable diverse intelligent functionalities, such as decision-making and task execution, accurate environmental perception--encompassing the acquisition of object position, size, and category--is essential. Manuscript received 24 April 2025; revised 20 July 2025; accepted 26 August 2025. This work was supported in part by the by the National Natural Science Foundation of China under Grant 62125101, Grant 62341101, and Grant 62271351; in part by the New Cornerstone Science Foundation through the XPLORER PRIZE. Rongqing Zhang is with Intelligent Transportation Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China (email: rongqingz@tongji.edu.cn).


Conditional Denoising Diffusion for ISAC Enhanced Channel Estimation in Cell-Free 6G

arXiv.org Artificial Intelligence

Cell-free Integrated Sensing and Communication (ISAC) aims to revolutionize 6th Generation (6G) networks. By combining distributed access points with ISAC capabilities, it boosts spectral efficiency, situational awareness, and communication reliability. Channel estimation is a critical step in cell-free ISAC systems to ensure reliable communication, but its performance is usually limited by challenges such as pilot contamination and noisy channel estimates. This paper presents a novel framework leveraging sensing information as a key input within a Conditional Denoising Diffusion Model (CDDM). In this framework, we integrate CDDM with a Multimodal Transformer (MMT) to enhance channel estimation in ISAC-enabled cell-free systems. The MMT encoder effectively captures inter-modal relationships between sensing and location data, enabling the CDDM to iteratively denoise and refine channel estimates. Simulation results demonstrate that the proposed approach achieves significant performance gains. As compared with Least Squares (LS) and Minimum Mean Squared Error (MMSE) estimators, the proposed model achieves normalized mean squared error (NMSE) improvements of 8 dB and 9 dB, respectively. Moreover, we achieve a 27.8% NMSE improvement compared to the traditional denoising diffusion model (TDDM), which does not incorporate sensing channel information. Additionally, the model exhibits higher robustness against pilot contamination and maintains high accuracy under challenging conditions, such as low signal-to-noise ratios (SNRs). According to the simulation results, the model performs well for users near sensing targets by leveraging the correlation between sensing and communication channels.


EM-Net: Gaze Estimation with Expectation Maximization Algorithm

arXiv.org Artificial Intelligence

Appearance-based gaze estimation methods use has gradually improved, but existing methods often full-face or eye images to infer gaze direction. Appearancebased rely on large datasets or large models to improve methods can be categorized into two types based on performance, which leads to high demands on computational the input data: eye image-based [10, 11, 12] and full-face resources. In terms of this issue, this paper proposes image-based [13, 14, 15, 16]. Eye image-based methods a lightweight gaze estimation model EM-Net based on use either monocular or binocular images as input data, and deep learning and traditional machine learning algorithms since the direction of gaze is closely related to the head Expectation Maximization algorithm. First, the proposed pose, these methods usually need to concatenate the head Global Attention Mechanism (GAM) is added to extract pose information after extracting the eye features to compute features related to gaze estimation to improve the model's the final gaze direction. The full-face image-based ability to capture global dependencies and thus improve its methods directly use face images as input, which contains performance. Second, by learning hierarchical feature representations more features related to gaze estimation and can effectively through the EM module, the model has strong improve the performance of the model. Therefore, this generalization ability, which reduces the need for sample method has gradually become a research hotspot.


Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

arXiv.org Artificial Intelligence

Recent synthetic speech detectors leveraging the Transformer model have superior performance compared to the convolutional neural network counterparts. This improvement could be due to the powerful modeling ability of the multi-head self-attention (MHSA) in the Transformer model, which learns the temporal relationship of each input token. However, artifacts of synthetic speech can be located in specific regions of both frequency channels and temporal segments, while MHSA neglects this temporal-channel dependency of the input sequence. In this work, we proposed a Temporal-Channel Modeling (TCM) module to enhance MHSA's capability for capturing temporal-channel dependencies. Experimental results on the ASVspoof 2021 show that with only 0.03M additional parameters, the TCM module can outperform the state-of-the-art system by 9.25% in EER. Further ablation study reveals that utilizing both temporal and channel information yields the most improvement for detecting synthetic speech.


A Novel Cross-band CSI Prediction Scheme for Multi-band Fingerprint based Localization

arXiv.org Artificial Intelligence

Because of the advantages of computation complexity compared with traditional localization algorithms, fingerprint based localization is getting increasing demand. Expanding the fingerprint database from the frequency domain by channel reconstruction can improve localization accuracy. However, in a mobility environment, the channel reconstruction accuracy is limited by the time-varying parameters. In this paper, we proposed a system to extract the time-varying parameters based on space-alternating generalized expectation maximization (SAGE) algorithm, then used variational auto-encoder (VAE) to reconstruct the channel state information on another channel. The proposed scheme is tested on the data generated by the deep-MIMO channel model. Mathematical analysis for the viability of our system is also shown in this paper.


Graph Neural Networks for Physical-Layer Security in Multi-User Flexible-Duplex Networks

arXiv.org Artificial Intelligence

This paper explores Physical-Layer Security (PLS) in Flexible Duplex (FlexD) networks, considering scenarios involving eavesdroppers. Our investigation revolves around the intricacies of the sum secrecy rate maximization problem, particularly when faced with coordinated and distributed eavesdroppers employing a Minimum Mean Square Error (MMSE) receiver. Our contributions include an iterative classical optimization solution and an unsupervised learning strategy based on Graph Neural Networks (GNNs). To the best of our knowledge, this work marks the initial exploration of GNNs for PLS applications. Additionally, we extend the GNN approach to address the absence of eavesdroppers' channel knowledge. Extensive numerical simulations highlight FlexD's superiority over Half-Duplex (HD) communications and the GNN approach's superiority over the classical method in both performance and time complexity.


Hierarchical Multi-Agent Multi-Armed Bandit for Resource Allocation in Multi-LEO Satellite Constellation Networks

arXiv.org Artificial Intelligence

Low Earth orbit (LEO) satellite constellation is capable of providing global coverage area with high-rate services in the next sixth-generation (6G) non-terrestrial network (NTN). Due to limited onboard resources of operating power, beams, and channels, resilient and efficient resource management has become compellingly imperative under complex interference cases. However, different from conventional terrestrial base stations, LEO is deployed at considerable height and under high mobility, inducing substantially long delay and interference during transmission. As a result, acquiring the accurate channel state information between LEOs and ground users is challenging. Therefore, we construct a framework with a two-way transmission under unknown channel information and no data collected at long-delay ground gateway. In this paper, we propose hierarchical multi-agent multi-armed bandit resource allocation for LEO constellation (mmRAL) by appropriately assigning available radio resources. LEOs are considered as collaborative multiple macro-agents attempting unknown trials of various actions of micro-agents of respective resources, asymptotically achieving suitable allocation with only throughput information. In simulations, we evaluate mmRAL in various cases of LEO deployment, serving numbers of users and LEOs, hardware cost and outage probability. Benefited by efficient and resilient allocation, the proposed mmRAL system is capable of operating in homogeneous or heterogeneous orbital planes or constellations, achieving the highest throughput performance compared to the existing benchmarks in open literature.


Interference and noise cancellation for joint communication radar (JCR) system based on contextual information

arXiv.org Artificial Intelligence

This paper examines the separation of wireless communication and radar signals, thereby guaranteeing cohabitation and acting as a panacea to spectrum sensing. First, considering that the channel impulse response was known by the receivers (communication and radar), we showed that the optimizing beamforming weights mitigate the interference caused by signals and improve the physical layer security (PLS) of the system. Furthermore, when the channel responses were unknown, we designed an interference filter as a low-complex noise and interference cancellation autoencoder. By mitigating the interference on the legitimate users, the PLS was guaranteed. Results showed that even for a low signal-to-noise ratio, the autoencoder produces low root-mean-square error (RMSE) values.


Deep Learning Based RIS Channel Extrapolation with Element-grouping

arXiv.org Artificial Intelligence

Reconfigurable intelligent surface (RIS) is considered as a revolutionary technology for future wireless communication networks. In this letter, we consider the acquisition of the cascaded channels, which is a challenging task due to the massive number of passive RIS elements. To reduce the pilot overhead, we adopt the element-grouping strategy, where each element in one group shares the same reflection coefficient and is assumed to have the same channel condition. We analyze the channel interference caused by the element-grouping strategy and further design two deep learning based networks. The first one aims to refine the partial channels by eliminating the interference, while the second one tries to extrapolate the full channels from the refined partial channels. We cascade the two networks and jointly train them. Simulation results show that the proposed scheme provides significant gain compared to the conventional element-grouping method without interference elimination.


Channel-Aware Adversarial Attacks Against Deep Learning-Based Wireless Signal Classifiers

arXiv.org Machine Learning

This paper presents channel-aware adversarial attacks against deep learning-based wireless signal classifiers. There is a transmitter that transmits signals with different modulation types. A deep neural network is used at each receiver to classify its over-the-air received signals to modulation types. In the meantime, an adversary transmits an adversarial perturbation (subject to a power budget) to fool receivers into making errors in classifying signals that are received as superpositions of transmitted signals and adversarial perturbations. First, these evasion attacks are shown to fail when channels are not considered in designing adversarial perturbations. Then realistic attacks are presented by considering channel effects from the adversary to each receiver. After showing that a channel-aware attack is selective (i.e., it affects only the receiver whose channel is considered in the perturbation design), a broadcast adversarial attack is presented by crafting a common adversarial perturbation to simultaneously fool classifiers at different receivers. The major vulnerability of modulation classifiers to over-the-air adversarial attacks is shown by accounting for different levels of information available about channel, transmitter input, and classifier model. Finally, a certified defense based on randomized smoothing that augments training data with noise is introduced to make modulation classifier robust to adversarial perturbations.